Load Libraries and Import Data

# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(readr)

# Read the CSV files
final_data <- read_csv("data/G6.final.data.csv", )
## Rows: 939 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): ID, location, species, climate
## dbl (4): ecotr_id, tree_id, pear_id, quality_index
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
size_data <- read_csv("data/G6.size.data.csv")
## Rows: 207 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): species, climate
## dbl (28): pear_id, tree_id, ecotr_id, week_0, week_1, week_2, week_3, week_4...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
soil_data <- read_csv("data/G6.soil.data.csv")
## Rows: 144 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): species, climate
## dbl (19): ecotr_id, Bulk Density, Infiltration, Soil Porosity, Soil Depth, W...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Quick preview of data structures
head(final_data)
## # A tibble: 6 × 8
##   ID       location ecotr_id tree_id pear_id species    climate    quality_index
##   <chr>    <chr>       <dbl>   <dbl>   <dbl> <chr>      <chr>              <dbl>
## 1 BE_1_1_1 BE              1       1       1 Conference Scenario 1            64
## 2 BE_1_1_2 BE              1       1       2 Conference Scenario 1            35
## 3 BE_1_1_3 BE              1       1       3 Conference Scenario 1            71
## 4 BE_1_1_4 BE              1       1       4 Conference Scenario 1            62
## 5 BE_1_1_5 BE              1       1       5 Conference Scenario 1            71
## 6 BE_1_1_6 BE              1       1       6 Conference Scenario 1            94
head(size_data)
## # A tibble: 6 × 30
##   pear_id tree_id ecotr_id species    climate week_0 week_1 week_2 week_3 week_4
##     <dbl>   <dbl>    <dbl> <chr>      <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1       1       1        1 Conference Scenar…      0    0.6    0.8    1.2    2.3
## 2       2       1        1 Conference Scenar…      0    0.2    0.4    0.5    0.8
## 3       3       1        1 Conference Scenar…      0    0.6    1.1    1.5    1.5
## 4       4       1        1 Conference Scenar…      0    0.3    1.1    1.3    1.3
## 5       5       1        1 Conference Scenar…      0    0.2    1.3    1.4    2.1
## 6       6       1        1 Conference Scenar…      0    0.5    2      2      2.1
## # ℹ 20 more variables: week_5 <dbl>, week_6 <dbl>, week_7 <dbl>, week_8 <dbl>,
## #   week_9 <dbl>, week_10 <dbl>, week_11 <dbl>, week_12 <dbl>, week_13 <dbl>,
## #   week_14 <dbl>, week_15 <dbl>, week_16 <dbl>, week_17 <dbl>, week_18 <dbl>,
## #   week_19 <dbl>, week_20 <dbl>, week_21 <dbl>, week_22 <dbl>, week_23 <dbl>,
## #   week_24 <dbl>
head(soil_data)
## # A tibble: 6 × 21
##   species    ecotr_id climate    `Bulk Density` Infiltration `Soil Porosity`
##   <chr>         <dbl> <chr>               <dbl>        <dbl>           <dbl>
## 1 Conference        1 Scenario 1           1.33         6.87            37.4
## 2 Conference        1 Scenario 1           1.3          1.87            25.0
## 3 Conference        1 Scenario 1           1.55         3.29            34.7
## 4 Doyenne           1 Scenario 1           1.47         4.17            40.6
## 5 Doyenne           1 Scenario 1           1.52         2.91            30.3
## 6 Doyenne           1 Scenario 1           1.59         7.16            40.9
## # ℹ 15 more variables: `Soil Depth` <dbl>, `Water Holding Capacity` <dbl>,
## #   `Soil Nitrate` <dbl>, `Soil pH` <dbl>, Phosphorus <dbl>, Potassium <dbl>,
## #   Earthworms <dbl>, `Microbial Biomass C` <dbl>, `Microbial Biomass N` <dbl>,
## #   `Particulate Organic Matter` <dbl>, `Mineralizable N` <dbl>,
## #   `Soil Enzymes` <dbl>, `Soil Respiration` <dbl>,
## #   `Total Organic Carbon` <dbl>, pears <dbl>

Data Preparation

# Check for missing values in each dataset
sapply(final_data, function(x) sum(is.na(x)))
##            ID      location      ecotr_id       tree_id       pear_id 
##             0             0             0             0             0 
##       species       climate quality_index 
##             0             0             0
sapply(size_data, function(x) sum(is.na(x)))
##  pear_id  tree_id ecotr_id  species  climate   week_0   week_1   week_2 
##        0        0        0        0        0        0        0        0 
##   week_3   week_4   week_5   week_6   week_7   week_8   week_9  week_10 
##        0        0        0        0        0        0        0        0 
##  week_11  week_12  week_13  week_14  week_15  week_16  week_17  week_18 
##        0        0        0        0        0        0        0        0 
##  week_19  week_20  week_21  week_22  week_23  week_24 
##        0        0        0        0        0        0
sapply(soil_data, function(x) sum(is.na(x)))
##                    species                   ecotr_id 
##                          0                          0 
##                    climate               Bulk Density 
##                          0                          0 
##               Infiltration              Soil Porosity 
##                          0                          0 
##                 Soil Depth     Water Holding Capacity 
##                          0                          0 
##               Soil Nitrate                    Soil pH 
##                          0                          0 
##                 Phosphorus                  Potassium 
##                          0                          0 
##                 Earthworms        Microbial Biomass C 
##                          0                          0 
##        Microbial Biomass N Particulate Organic Matter 
##                          0                          0 
##            Mineralizable N               Soil Enzymes 
##                          0                          0 
##           Soil Respiration       Total Organic Carbon 
##                          0                          0 
##                      pears 
##                          0
#check data type of each column

sapply(final_data, typeof)
##            ID      location      ecotr_id       tree_id       pear_id 
##   "character"   "character"      "double"      "double"      "double" 
##       species       climate quality_index 
##   "character"   "character"      "double"
# Convert 'climate' and 'species' to factors
final_data <- final_data %>%
  mutate(
    climate = as.factor(climate),
    species = as.factor(species),
    location = as.factor(location),
    ecotr_id = as.factor(ecotr_id),
    tree_id = as.factor(tree_id)
  )

# Add binary quality indicator (cut-off = 60)
final_data <- final_data %>%
  mutate(quality_b = ifelse(quality_index >= 60, 1, 0))

Main Research Question

How Does pear quality varies across the four climate scenarios and which scenario improves pear quality.

Explor Data

# 1️⃣ Distribution of Quality Index
ggplot(final_data, aes(x = quality_index)) +
  geom_histogram(binwidth = 5, fill = "#69b3a2", color = "black") +
  theme_minimal() +
  labs(title = "Distribution of Quality Index", x = "Quality Index", y = "Count")

# 2️⃣ Boxplot by Species
ggplot(final_data, aes(x = species, y = quality_index, fill = species)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Quality Index by Species", x = "Species", y = "Quality Index")

# 3️⃣ Boxplot by Climate Scenario
ggplot(final_data, aes(x = climate, y = quality_index, fill = climate)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Quality Index by Climate Scenario", x = "Climate", y = "Quality Index")

# 4️⃣ Interaction: Species × Climate
ggplot(final_data, aes(x = climate, y = quality_index, fill = species)) +
  geom_boxplot(position = position_dodge(width = 0.8)) +
  theme_minimal() +
  labs(title = "Interaction: Species × Climate", x = "Climate", y = "Quality Index")

# 5️⃣ Scatterplot: Quality Index by Tree
ggplot(final_data, aes(x = tree_id, y = quality_index)) +
  geom_point(color = "steelblue") +
  theme_minimal() +
  labs(title = "Quality Index per Tree", x = "Tree ID", y = "Quality Index") +
  theme(axis.text.x = element_text(angle = 90))

# 6️⃣ Mean ± SD plot by Species and Climate (useful for model design)
final_data %>%
  group_by(species, climate) %>%
  summarise(
    mean_quality = mean(quality_index, na.rm = TRUE),
    sd_quality = sd(quality_index, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = climate, y = mean_quality, group = species, color = species)) +
  geom_line() +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_quality - sd_quality, ymax = mean_quality + sd_quality), width = 0.2) +
  theme_minimal() +
  labs(title = "Mean ± SD of Quality Index", y = "Mean Quality Index", x = "Climate Scenario")

  • interaction effect was not significant so removed.
fixed effects
fixed effects
fixed effects
fixed effects

Exploratory Data Analysis

ggplot(final_data, aes(x = quality_index)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  geom_vline(xintercept = 60, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(title = "Distribution of Quality Index", x = "Quality Index", y = "Count")

ggplot(final_data, aes(x = factor(quality_b))) +
  geom_bar(fill = "steelblue") +
  theme_minimal() +
  labs(title = "Good vs. Poor Quality Counts", x = "Quality (0 = Poor, 1 = Good)", y = "Count")

final_data %>%
  group_by(climate) %>%
  summarise(GoodQualityRate = mean(quality_b)) %>%
  ggplot(aes(x = climate, y = GoodQualityRate, fill = climate)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Proportion of Good Quality Pears by Climate", y = "Proportion", x = "Climate")

Seem Scenario 2 has more good quality pears followed by scenario 3

head(final_data)
## # A tibble: 6 × 9
##   ID       location ecotr_id tree_id pear_id species    climate    quality_index
##   <chr>    <fct>    <fct>    <fct>     <dbl> <fct>      <fct>              <dbl>
## 1 BE_1_1_1 BE       1        1             1 Conference Scenario 1            64
## 2 BE_1_1_2 BE       1        1             2 Conference Scenario 1            35
## 3 BE_1_1_3 BE       1        1             3 Conference Scenario 1            71
## 4 BE_1_1_4 BE       1        1             4 Conference Scenario 1            62
## 5 BE_1_1_5 BE       1        1             5 Conference Scenario 1            71
## 6 BE_1_1_6 BE       1        1             6 Conference Scenario 1            94
## # ℹ 1 more variable: quality_b <dbl>

Almost Balanced.

final_data %>%
  group_by(climate, species) %>%
  summarise(GoodQualityRate = mean(quality_b), .groups = "drop") %>%
  ggplot(aes(x = species, y = GoodQualityRate, fill = species)) +
  geom_col() +
  facet_wrap(~climate) +
  theme_minimal() +
  labs(title = "Good Quality Proportion by Species and Climate", y = "Proportion")

final_data %>%
  group_by(climate, species) %>%
  summarise(GoodQualityRate = mean(quality_b), .groups = "drop") %>%
  ggplot(aes(x = climate, y = species, fill = GoodQualityRate)) +
  geom_tile(color = "white") +
  geom_text(aes(label = scales::percent(GoodQualityRate, accuracy = 1)), color = "black", size = 4) +
  scale_fill_viridis_c(option = "C", begin = 0.2, end = 0.9) +
  theme_minimal(base_size = 13) +
  labs(
    title = "Heatmap of Good Quality Proportion",
    x = "Climate",
    y = "Species",
    fill = "Proportion\nGood Quality"
  )

library(dplyr)
library(tidyr)
library(ggplot2)

# Aggregate good and total pears per tree
tree_yield <- final_data %>%
  group_by(tree_id, climate, species) %>%
  summarise(
    total_pears = n(),
    good_quality_pears = sum(quality_index >= 60),
    .groups = "drop"
  ) %>%
  pivot_longer(cols = c(total_pears, good_quality_pears),
               names_to = "Count_Type",
               values_to = "Number_of_Pears")

ggplot(tree_yield, aes(x = climate, y = Number_of_Pears, fill = Count_Type)) +
  geom_boxplot(position = position_dodge(width = 0.75), width = 0.6) +
  facet_wrap(~species, nrow = 1, scales = "free_x") +
  scale_fill_manual(values = c("good_quality_pears" = "#4575b4", "total_pears" = "#e0ecf4"),
                    labels = c("Good Quality Pears", "Total Pears")) +
  labs(
    title = "Distribution of Total vs. Good-Quality Pears per Tree",
    subtitle = "By Climate Scenario and Species",
    x = "Climate Scenario",
    y = "Number of Pears",
    fill = "Count Type"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), panel.grid.major.x = element_blank())

ggsave("pear_yield_plot.png", width = 12, height = 6, dpi = 300)
# Filter for only good quality pears
tree_yield_good <- tree_yield %>%
  filter(Count_Type == "good_quality_pears")

# Plot only good-quality pears
ggplot(tree_yield_good, aes(x = climate, y = Number_of_Pears, fill = Count_Type)) +
  geom_boxplot(position = position_dodge(width = 0.75), width = 0.6) +
  facet_wrap(~species, nrow = 1, scales = "free_x") +
  scale_fill_manual(values = c("good_quality_pears" = "#4575b4"),
                    labels = c("Good Quality Pears")) +
  labs(
    title = "Distribution of Good-Quality Pears per Tree",
    subtitle = "By Climate Scenario and Species",
    x = "Climate Scenario",
    y = "Number of Good-Quality Pears",
    fill = "Count Type"
  ) +
  theme_minimal(base_size = 16) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid.major.x = element_blank())

# Save
ggsave("good_quality_pears_plot.png", width = 12, height = 6, dpi = 300)
ggplot(tree_yield_good, aes(x = climate, y = Number_of_Pears, fill = species)) +
  geom_violin(alpha = 0.4, position = position_dodge(width = 0.9), trim = FALSE) +
  geom_boxplot(width = 0.2, position = position_dodge(width = 0.9), outlier.shape = NA) +
  geom_jitter(position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.9),
              alpha = 0.3, size = 1) +
  facet_wrap(~species, scales = "free_x") +
  labs(title = "Good-Quality Pear Yield Distribution by Climate and Species",
       x = "Climate Scenario", y = "Good-Quality Pears per Tree") +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Question 2

(b) Analyze the outcome ‘size’ of the pears over time, taking into account the covariates.

# Pivot to long format
data_long <- size_data %>%
  pivot_longer(
    cols = starts_with("week_"),
    names_to = "week",
    names_prefix = "week_",
    values_to = "size"
  ) %>%
  mutate(week = as.integer(week))
head(data_long)
## # A tibble: 6 × 7
##   pear_id tree_id ecotr_id species    climate     week  size
##     <dbl>   <dbl>    <dbl> <chr>      <chr>      <int> <dbl>
## 1       1       1        1 Conference Scenario 1     0   0  
## 2       1       1        1 Conference Scenario 1     1   0.6
## 3       1       1        1 Conference Scenario 1     2   0.8
## 4       1       1        1 Conference Scenario 1     3   1.2
## 5       1       1        1 Conference Scenario 1     4   2.3
## 6       1       1        1 Conference Scenario 1     5   3.1
# Average growth across all pears
ggplot(data_long, aes(x = week, y = size)) +
  stat_summary(fun = mean, geom = "line", color = "blue", size = 1) +
  stat_summary(fun.data = mean_se, geom = "ribbon", alpha = 0.2) +
  labs(title = "Average Pear Size Over Weeks", y = "Size", x = "Week")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

* data seems close to linear.

ggplot(data_long, aes(x = week, y = size, group = pear_id)) +
  geom_line(alpha = 0.3) +
  labs(title = "Individual Growth Trajectories of Pear size", y = "Size", x = "Week")

ggplot(data_long, aes(x = week, y = size, group = pear_id, color = species)) +
  geom_line(alpha = 0.6) +
  facet_wrap(~ climate) +
  labs(
    title = "Pear Growth Over Time by Scenario and Species",
    y = "Size",
    x = "Week",
    color = "Species"
  ) +
  theme_minimal()

head(data_long)
## # A tibble: 6 × 7
##   pear_id tree_id ecotr_id species    climate     week  size
##     <dbl>   <dbl>    <dbl> <chr>      <chr>      <int> <dbl>
## 1       1       1        1 Conference Scenario 1     0   0  
## 2       1       1        1 Conference Scenario 1     1   0.6
## 3       1       1        1 Conference Scenario 1     2   0.8
## 4       1       1        1 Conference Scenario 1     3   1.2
## 5       1       1        1 Conference Scenario 1     4   2.3
## 6       1       1        1 Conference Scenario 1     5   3.1
# Plot
ggplot(data_long, aes(x = week, y = size, group = pear_id, color = climate)) +
  geom_line(alpha = 0.2) +  # individual pear growth
  stat_smooth(aes(group = climate), method = "lm", se = FALSE, size = 1.2) +  # smooth lines
  facet_wrap(~ species) +  # split by species
  labs(title = "Pear Size Over Time by Climate Scenario and Species",
       x = "Week", y = "Size") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

* Conference shows slightly faster growth across all scenarios as compared to Doyenne Specie * Apparently, pears size in scenario one is larger from 15 weeks onward across both species.

# Compute average size per week per species and climate
avg_growth <- data_long %>%
  group_by(week, species, climate) %>%
  summarise(mean_size = mean(size, na.rm = TRUE), .groups = "drop")

# Plot
ggplot(data_long, aes(x = week, y = size, group = pear_id, color = species)) +
  geom_line(alpha = 0.3) +  # individual lines
  geom_line(data = avg_growth, aes(x = week, y = mean_size, group = species),
            size = 0.5, linetype = "solid") +  # average lines
  facet_wrap(~ climate) +
  labs(
    title = "Pear Growth Over Time by Scenario and Species",
    y = "Size",
    x = "Week",
    color = "Species"
  ) +
  theme_minimal()

Observation:

Suggested Model

As the intercept starts with 0 so we don’t need random intercept There is a within pears variability So we’ll include random slops in the model because we are interested in change of pear size over time (growth rate), which is change in slope.

  • Fixed effect Climate Scenarios & Species
  • Random Effect Pear Id
  • Our main concern it to check pear size (growth rate) over time.
  • The growth rate might varies across climate scenarios so **climate*week** (The effect of time (week) on pear size depends on climate averaged across species)
  • Also growth rate might varies across species **species*week**. (The effect of time (week) on pear size depends on species averaged across climate scenarios)
  • climatespeciesweek : This tests whether the pear size (growth rate) differs between species depends on climate scenarios. The species difference in growth rate (for example Species 1 grows faster than Species 2) might be stronger in some climates than others.
# let's export the data and prepare statistical model in sas instead of R
write.csv(data_long, "pear_growth_data.csv", row.names = FALSE)
head(data_long)
## # A tibble: 6 × 7
##   pear_id tree_id ecotr_id species    climate     week  size
##     <dbl>   <dbl>    <dbl> <chr>      <chr>      <int> <dbl>
## 1       1       1        1 Conference Scenario 1     0   0  
## 2       1       1        1 Conference Scenario 1     1   0.6
## 3       1       1        1 Conference Scenario 1     2   0.8
## 4       1       1        1 Conference Scenario 1     3   1.2
## 5       1       1        1 Conference Scenario 1     4   2.3
## 6       1       1        1 Conference Scenario 1     5   3.1

Model results and interpretation:

Pear Growth Chart
Pear Growth Chart
  • First let’s interpret non significant effect so that we can stick with the model without it.
  • The interaction between weekclimatespecies is highly non significant and effect size is very small, which shows the pear size between species doesn’t depends on climate change.
  • let’s fit the model after removing it and then interpret it

fixed effects fixed effects

Fixed Effects Interpretation

  • All climate scenarios show significant growth (p < 0.0001). Scenario one with the fastest growth as compared to scenario 2 and 3.
  • Conference pears outperform Doyenne consistently across all climates

Random Effects

  • Minimal variability between individual pears’ growth rates. Variance = 0.00204
  • Much Residual variance: High unexplained variability, suggesting other factors ( soil quality, water) may influence size.

Type 3 Tests of Fixed Effects

  • Conference grows significantly faster than Doyenne (F=481.78, p<.0001).
  • Overall effect: Growth rates differ significantly across climates (F=9.24, p<.0001).